A Decision Tree Based Record Linkage for Recommendation Systems

Authors

  • MS. N. S. Sheth
  • A. R. Deshpande
Abstract

Record linkage merges, at the entity level, all records that relate to the same entity across multiple datasets. It is the initial data preparation phase for most database projects. Traditionally, one-to-one data linkage is performed among entities of the same type that share a common unique identifier. The proposed one-to-many and/or many-to-many record linkage method can link entities of the same or different types, with or without a common unique identifier. It is a probabilistic record linkage method based on constructing a clustering tree that classifies the matching entities by linkage. Tree construction relies on a splitting criterion that selects the best attribute for partitioning the dataset at each node of the tree. In the recommender system domain, record linkage is used to match a new user with their product expectations and to produce a list of recommendations at each leaf of the tree. In the proposed method, decision tree based record linkage is applied to generate book recommendations. This technique is also useful in addressing the cold-start and new-user problems.

Keywords: Decision tree, Classification, Clustering, Splitting Criterion, Record Linkage, Model Based, Recommendation System.
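
The attribute selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which splitting criterion the authors use, so information gain is assumed here purely for illustration; the function names, the toy user records, and the attributes (`age_group`, `region`, `genre`) are all hypothetical and do not come from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(records, attribute, target):
    """Entropy reduction in `target` after partitioning `records` by `attribute`."""
    base = entropy([r[target] for r in records])
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(
        len(part) / len(records) * entropy(part) for part in partitions.values()
    )
    return base - remainder


def best_split(records, attributes, target):
    """Pick the attribute with the highest information gain (assumed criterion)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical toy data: user attributes linked to a preferred book genre.
users = [
    {"age_group": "teen", "region": "north", "genre": "fantasy"},
    {"age_group": "teen", "region": "south", "genre": "fantasy"},
    {"age_group": "adult", "region": "north", "genre": "history"},
    {"age_group": "adult", "region": "south", "genre": "history"},
]

best = best_split(users, ["age_group", "region"], "genre")
```

In this toy dataset, `age_group` separates the genres perfectly while `region` carries no information, so a tree built this way would split on `age_group` first. Other criteria could be substituted in `best_split` without changing the rest of the sketch.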

Similar resources

A Comparative Study in Classification Techniques for Unsupervised Record Linkage Model

Problem statement: Record linkage is a technique used to detect and match duplicate records generated during the data integration process. A variety of record linkage algorithms with different steps have been developed to detect such duplicate records. To determine whether two records are duplicates, supervised and unsupervised classification techniques are utilized in ...

Tuning & Recommended Related Evolution Approaches for Distributed Databases

Today’s databases are complex and contain duplicates. Because of this complexity, we introduce tuning and recommendation techniques. Tuning and recommendation is an important part of data integration. Existing techniques such as record matching and record linkage detect the same entities in a single database. Deduplication removes the duplicates in a single database. T...

Text Data Linkage of Different Entities Using Occt-One Class Clustering Tree

A new one-to-many and many-to-many data linkage method is based on a One-Class Clustering Tree (OCCT), which characterizes the entities that should be linked together. It is evaluated using datasets from data leakage prevention, recommender systems and fraud detection. The tree is built so that it is easy to understand and to transform into association rules. Data linkage is closely related to entity r...

TREE AUTOMATA BASED ON COMPLETE RESIDUATED LATTICE-VALUED LOGIC: REDUCTION ALGORITHM AND DECISION PROBLEMS

In this paper, at first we define the concepts of response function and accessible states of a complete residuated lattice-valued (for simplicity we write $\mathcal{L}$-valued) tree automaton with a threshold $c$. Then, related to these concepts, we prove some lemmas and theorems that are applied in considering some decision problems such as finiteness-value and emptiness-value of recognizable t...

Classification with Pedigree and its Applicability to Record Linkage

Real-world data is virtually never noise-free. Current methods for handling noise do so either by removing noisy instances or by trying to clean noisy attributes. Neither of these deal directly with the issue of noise and in fact removing a noisy instance is not a viable option in many real systems. In this paper, we consider the problem of noise in the context of record linkage, a frequent pro...

Publication date: 2015